Modelling Emotion Based Reward Valuation with Computational Reinforcement Learning
We show that computational reinforcement learning can model human decision making in the Iowa Gambling Task (IGT). The IGT is a card game which tests decision making under uncertainty. In our experiments, we found that modulating learning rate decay in Q-learning enables the approximation of both the behaviour of normal subjects and that of subjects who are emotionally impaired by ventromedial prefrontal lesions. Outcomes observed in impaired subjects are modelled by high learning rate decay, while low learning rate decay replicates healthy subjects under otherwise identical conditions. The ventromedial prefrontal cortex has been associated with emotion-based reward valuation, and the value function in reinforcement learning provides an analogous assessment mechanism. Thus reinforcement learning can provide a good model for the role of emotional reward as a modulator of the learning rate.
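As a rough illustration of the mechanism the abstract describes (not the authors' actual implementation), a minimal Q-learning loop for an IGT-like four-deck bandit can decay its learning rate per update; the payoff schedule and the `decay` constant below are illustrative assumptions.

```python
import random

# Illustrative payoff schedule for an IGT-like four-deck bandit (not the
# real IGT decks): decks 0/1 pay more per draw but carry occasional large
# losses; decks 2/3 are safer and better in the long run.
def draw(deck):
    if deck < 2:  # "bad" decks
        return 100 - (1250 if random.random() < 0.1 else 0)
    return 50 - (250 if random.random() < 0.1 else 0)

def run_igt(decay, episodes=100, alpha0=0.5, epsilon=0.1):
    """Q-learning on a 4-armed bandit with per-step learning-rate decay.

    High `decay` freezes estimates early, so the often-rewarding first
    draws from the bad decks dominate (mimicking impaired subjects);
    low decay lets later losses correct the estimates (healthy subjects).
    """
    q = [0.0] * 4
    alpha = alpha0
    for _ in range(episodes):
        deck = (random.randrange(4) if random.random() < epsilon
                else max(range(4), key=lambda d: q[d]))
        reward = draw(deck)
        q[deck] += alpha * (reward - q[deck])  # bandit Q-update (no successor state)
        alpha *= 1.0 - decay                   # learning-rate decay
    return q
```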
Rule Value Reinforcement Learning for Cognitive Agents
RVRL (Rule Value Reinforcement Learning) is a new algorithm which extends an existing learning framework that models the environment of a situated agent using a probabilistic rule representation. The algorithm attaches values to learned rules by adapting reinforcement learning. Structure captured by the rules is used to form a policy. The resulting rule values represent the utility of taking an action if the rule's conditions are present in the agent's current percept. Advantages of the new framework are demonstrated through examples in a predator-prey environment.
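A minimal sketch of the idea of attaching values to rules, assuming a hypothetical `Rule` type whose conditions are a subset of the agent's percept; the update below follows the familiar Q-learning form and is not the published RVRL algorithm.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical probabilistic rule: fires when its conditions are a
    subset of the agent's current percept; carries a learned value."""
    conditions: frozenset
    action: str
    value: float = 0.0

def matching(rules, percept):
    return [r for r in rules if r.conditions <= percept]

def update_rule_values(rules, percept, action, reward, next_percept,
                       alpha=0.1, gamma=0.9):
    """Q-learning-style update applied to rule values rather than to
    states (a sketch of the idea, not the exact RVRL update)."""
    next_best = max((r.value for r in matching(rules, next_percept)),
                    default=0.0)
    for rule in matching(rules, percept):
        if rule.action == action:
            rule.value += alpha * (reward + gamma * next_best - rule.value)
```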
NPCs as People, Too: The Extreme AI Personality Engine
Philip K. Dick once asked “Do Androids Dream of Electric Sheep?” In video games, a similar question could be asked of non-player characters: do NPCs have dreams? Can they live and change as humans do? Can NPCs have personalities, and can these develop through interactions with players, other NPCs, and the world around them? Despite advances in personality AI for games, most NPCs are still undeveloped and undeveloping, reacting with flat affect and predictable routines that make them far less than human; in fact, they become little more than bits of the scenery that give out parcels of information. This need not be the case. Extreme AI, a psychology-based personality engine, creates adaptive NPC personalities. Originally developed as part of the thesis “NPCs as People: Using Databases and Behaviour Trees to Give Non-Player Characters Personality,” Extreme AI is now a fully functioning personality engine that uses all thirty facets of the Five Factor model of personality and an AI system that is live throughout gameplay. This paper discusses the research leading to Extreme AI; develops the ideas found in that thesis; discusses the development of other personality engines; and provides examples of Extreme AI’s use in two game demos.
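For orientation, a minimal sketch of a thirty-facet personality state such as an engine like Extreme AI might maintain; the facet names follow the standard NEO Five Factor taxonomy, but the drift update is purely an illustrative assumption, not Extreme AI's method.

```python
# Standard NEO facet names for the Five Factor model of personality.
FACETS = {
    "openness": ("fantasy", "aesthetics", "feelings", "actions",
                 "ideas", "values"),
    "conscientiousness": ("competence", "order", "dutifulness",
                          "achievement_striving", "self_discipline",
                          "deliberation"),
    "extraversion": ("warmth", "gregariousness", "assertiveness",
                     "activity", "excitement_seeking", "positive_emotions"),
    "agreeableness": ("trust", "straightforwardness", "altruism",
                      "compliance", "modesty", "tender_mindedness"),
    "neuroticism": ("anxiety", "angry_hostility", "depression",
                    "self_consciousness", "impulsiveness", "vulnerability"),
}

class Personality:
    """All 30 facets, each scored in [0, 1], nudged by in-game events."""
    def __init__(self):
        self.score = {f: 0.5 for facets in FACETS.values() for f in facets}

    def experience(self, facet, intensity, rate=0.02):
        """Drift a facet toward an interaction's intensity (illustrative),
        so personalities develop gradually through gameplay."""
        self.score[facet] += rate * (intensity - self.score[facet])
```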
Learning to Act with RVRL Agents
The use of reinforcement learning to guide action selection of cognitive agents has been shown to be a powerful technique for stochastic environments. Standard reinforcement learning techniques used to provide decision-theoretic policies rely, however, on explicit state-based computations of value for each state-action pair. This requires the computation of a number of values exponential in the number of state variables and actions in the system. This research extends existing work with an acquired probabilistic rule representation of an agent environment by developing an algorithm to apply reinforcement learning to values attached to the rules themselves. Structure captured by the rules is then used to learn a policy directly. The resulting value attached to each rule represents the utility of taking an action if the conditions of the rule are present in the agent’s current set of percepts. This has several advantages for planning purposes: it generalizes over many states and over unseen states; effective decisions can therefore be made with less training data than state-based modelling systems (e.g. Dyna Q-Learning) require; and the problem of computation in an exponential state-action space is alleviated. The results of applying this algorithm to rules in a specific environment are presented, with comparison to standard reinforcement learning policies developed in related work.
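Complementing the update sketch above, a greedy policy can be read directly out of the rule values; this sketch reuses the hypothetical `Rule` and `matching` helpers from the earlier RVRL example and shows why an unseen percept is still covered, since any rule whose conditions it satisfies can vote for an action.

```python
def select_action(rules, percept, actions):
    """Greedy policy read-out from rule values (sketch): score each action
    by the best value among rules whose conditions match the current
    percept, defaulting to 0 for actions with no matching rule."""
    def score(action):
        return max((r.value for r in matching(rules, percept)
                    if r.action == action), default=0.0)
    return max(actions, key=score)
```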
Approximate Dynamic Programming with Parallel Stochastic Planning Operators
This thesis presents an approximate dynamic programming (ADP) technique for environment modelling agents. The agent learns a set of parallel stochastic planning operators (P-SPOs) by evaluating changes in its environment in response to actions, using an association rule mining approach. An approximate policy is then derived by iteratively improving state value aggregation estimates attached to the operators using the P-SPOs as a model in a Dyna-Q-like architecture.
Reinforcement learning and dynamic programming are powerful techniques for automated agent decision making in stochastic environments. Dynamic programming is effective when there is a known environment model, while reinforcement learning is effective when a model is not available. The techniques derive a policy: a mapping from each environment state to an action which optimizes the long-term reward the agent receives.
The standard methods become less effective as the state space for the environment increases because they require values to be associated with each state, the storage and processing of which is exponential in the number of state variables. Resolving this “curse of dimensionality” is an important topic of research amongst all communities working on this problem. Two key methods are to: (i) derive an estimate of the value (approximate dynamic programming) using function approximation or state aggregation; or (ii) build a model of the environment from experience.
This thesis presents a method of combining these approaches by exploiting structure in the state transition and value functions captured in a set of planning operators which are learnt through experience in the environment. Standard planning operators define the deterministic changes that occur in an environment in response to an action. This work presents Parallel Stochastic Planning Operators (P-SPOs), a novel form of planning operator providing a structured model of the state transition function in environments which are both non-deterministic and for which changes can occur outside the influence of actions. Next, an automated method for extracting P-SPOs from observations in an environment is explored using an adaptation of association rule mining. Finally, methods of relating the state transition structure encapsulated in the P-SPOs to state values, using the operators to store state value aggregation estimates, are evaluated.
The framework described provides a method by which approximate dynamic programming can be applied by designers of AI agents and AI planning systems for which they have minimal prior knowledge. The framework and P-SPO-based implementations are tested against standard techniques in two benchmark stochastic environments: a “slippery gripper” block-painting robot; and a “predator-prey” agent environment.
Experimental results show that an agent using a P-SPO-based approach is able to learn an accurate model of its environment if successor state variables exhibit conditional independence, and an approximate model in the non-independent case. Results also demonstrate that the agent’s ability to generalise to previously unseen states using the model allows it to form an improved policy over an agent employing a standard Dyna-Q-based technique. Finally, an approximate policy stored in state aggregation estimates attached to operators is shown to be optimal in experiments for which the P-SPO set contains sufficient information for effective aggregations to be formed.
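A minimal sketch of what a P-SPO might look like as a data structure, based only on the abstract's description (an action, a partial-state context, a distribution over partial-state outcomes, and an attached value estimate); the field layout is an assumption, not the thesis' exact formulation.

```python
from dataclasses import dataclass

@dataclass
class PSPO:
    """Sketch of a parallel stochastic planning operator: when `action` is
    taken in a state satisfying `context`, each outcome (a partial
    successor state) occurs with some probability."""
    action: str
    context: frozenset   # state-variable assignments that must hold
    outcomes: dict       # {frozenset(effects): probability}
    value: float = 0.0   # aggregated state-value estimate stored on the operator

def applicable(pspos, state, action):
    return [op for op in pspos if op.action == action and op.context <= state]

def action_value(pspos, state, action):
    """One-step value estimate for an action: mean of the aggregation
    estimates stored on the applicable operators (illustrative)."""
    ops = applicable(pspos, state, action)
    return sum(op.value for op in ops) / len(ops) if ops else 0.0
```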
QL-BT: Enhancing Behaviour Tree Design and Implementation with Q-Learning
Artificial intelligence has become an increasingly important aspect of computer game technology, as designers attempt to deliver engaging experiences for players by creating characters with behavioural realism to match advances in graphics and physics. Recently, behaviour trees have come to the forefront of games AI technology, providing a more intuitive approach than previous techniques such as hierarchical state machines, which often required complex data structures that produced poorly structured code when scaled up. The design and creation of behaviour trees, however, requires experience and effort. This research introduces Q-learning behaviour trees (QL-BT), a method for applying reinforcement learning to behaviour tree design. The technique facilitates AI designers' use of behaviour trees by assisting them in identifying the most appropriate moment to execute each branch of AI logic, as well as providing an implementation that can be used to debug, analyse and optimize early behaviour tree prototypes. Initial experiments demonstrate that behaviour trees produced by the QL-BT algorithm effectively integrate RL, automate tree design, and are human-readable.
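One plausible way to combine the two techniques, sketched under assumptions: a selector node keeps a Q-value per child branch for each (hashable) abstract world state and tries branches in descending Q order. The state abstraction, the child interface (`tick(state)` returning a status string), and the update schedule are illustrative, not the published QL-BT design.

```python
import random

class QSelector:
    """Behaviour-tree selector whose child ordering is learned by
    Q-learning (sketch). Children are nodes exposing tick(state)."""
    def __init__(self, children, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.children, self.q = children, {}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def tick(self, state):
        qs = self.q.setdefault(state, [0.0] * len(self.children))
        if random.random() < self.epsilon:  # occasional exploratory ordering
            order = random.sample(range(len(self.children)), len(self.children))
        else:                               # try highest-valued branch first
            order = sorted(range(len(self.children)), key=lambda i: -qs[i])
        for i in order:
            status = self.children[i].tick(state)
            if status != "FAILURE":
                return i, status            # report which branch ran
        return None, "FAILURE"

    def learn(self, state, child, reward, next_state):
        """Standard Q-update on the branch that was executed."""
        target = reward + self.gamma * max(self.q.get(next_state, [0.0]))
        self.q[state][child] += self.alpha * (target - self.q[state][child])
```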
Development of a Virtual Laparoscopic Trainer using Accelerometer Augmented Tools to Assess Performance in Surgical Training
Previous research suggests that virtual reality (VR) may supplement conventional training in laparoscopy, and it may prove useful in the near future for selecting surgical trainees on the basis of their dexterity and spatial awareness skills. Current VR training solutions provide levels of realism and, in some instances, haptic feedback, but they are cumbersome: they are tethered, and they are not ergonomically close to the actual surgical instruments in weight or freedom of use. In addition, they are expensive, making them less accessible to departments than conventional box trainers. The box trainers, although more economical, lack tangible feedback and realism for handling delicate tissue structures. We have previously reported on the development of a modified, digitally enhanced surgical instrument for laparoscopic training, named the Parkar Tool. This tool contains wireless accelerometer and gyroscopic sensors integrated into actual laparoscopic instruments. By design, it alleviates the need for tethered and physically different-shaped tools, thereby enhancing realism when performing surgical procedures. Additionally, the software (Valhalla) can digitally record surgical motions, enabling it to remotely capture surgical training data for analysis and objective evaluation of performance. We have adapted and further developed our initial single-tool training method, as used with a laparoscopic pyloromyotomy scenario, into an enhanced method that uses multiple Parkar wireless tools simultaneously across several different case scenarios. This allows right- and left-handed dexterity to be used and measured, with the benefit of several tasks of differing complexity. A 3D tissue-surface deformation solution written in OpenGL gives us several different virtual surgical training scenario approximations to use with the instruments. The trainee can start by learning simple tasks, e.g. incising, grasping, squeezing and stretching tissue, and progress to more complex procedures such as suturing, herniotomies and bowel anastomoses, as well as the original pyloromyotomy used in the first model.
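To make "objective evaluation of performance" concrete, here is a small sketch of dexterity metrics computable from a recorded acceleration trace. Jerk-based smoothness is a commonly used motor-skill measure; its use here is an assumption for illustration, not a description of the Valhalla software's actual analysis.

```python
import numpy as np

def smoothness_metrics(accel, dt):
    """Objective dexterity measures from an (N x 3) acceleration trace
    sampled at interval dt seconds (illustrative).

    Lower mean squared jerk indicates smoother, more practised motion."""
    accel = np.asarray(accel, dtype=float)
    jerk = np.diff(accel, axis=0) / dt  # rate of change of acceleration
    return {
        "mean_sq_jerk": float(np.mean(np.sum(jerk**2, axis=1))),
        "mean_accel_magnitude": float(np.mean(np.linalg.norm(accel, axis=1))),
    }
```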
Implementing Racing AI using Q-Learning and Steering Behaviours
Artificial intelligence has become a fundamental component of modern computer games as developers produce ever more realistic experiences. This is particularly true of the racing game genre, in which AI plays a fundamental role. Reinforcement learning (RL) techniques, notably Q-Learning (QL), have in recent years been growing as feasible methods for implementing AI in racing games. The focus of this research is on implementing QL to create a policy for the AI agents to utilise in a racing game built with the Unity 3D game engine. QL is used (offline) to teach the agent appropriate throttle values around each part of the circuit, whilst the steering is handled using a predefined racing line. Two variations of the QL algorithm were implemented to examine their effectiveness. The agents also make use of steering behaviours (including obstacle avoidance) to ensure that they can adapt their movements in real time against other agents and players. Initial experiments showed that both variations performed well and produced competitive lap times when compared to a player.
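A hedged sketch of the offline stage described above, assuming the circuit is discretised into segments and the throttle into a few bands; `simulate_lap` is a hypothetical environment hook standing in for the Unity simulation, and the reward shaping is an assumption.

```python
import random

def train_throttle(simulate_lap, segments, throttles,
                   episodes=500, alpha=0.2, gamma=0.95, epsilon=0.1):
    """Offline Q-learning of a throttle value per track segment (sketch).

    simulate_lap(policy) is assumed to run one lap and yield, per segment,
    (segment_index, throttle_used, reward, next_segment_index)."""
    q = {(s, t): 0.0 for s in range(segments) for t in throttles}

    def policy(segment):
        if random.random() < epsilon:                 # explore
            return random.choice(throttles)
        return max(throttles, key=lambda t: q[(segment, t)])  # exploit

    for _ in range(episodes):
        for seg, t, reward, nxt in simulate_lap(policy):
            best_next = max(q[(nxt, t2)] for t2 in throttles)
            q[(seg, t)] += alpha * (reward + gamma * best_next - q[(seg, t)])

    # Greedy policy: best throttle per segment, used online with steering
    # behaviours layered on top for real-time avoidance of other cars.
    return {s: max(throttles, key=lambda t: q[(s, t)]) for s in range(segments)}
```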
Conditional Regressive Random Forest Stereo-based Hand Depth Recovery
This paper introduces the Conditional Regressive Random Forest (CRRF), a novel method that combines a closed-form Conditional Random Field (CRF), using learned weights, with a Regressive Random Forest (RRF) that employs adaptively selected expert trees. CRRF is used to estimate a depth image of a hand given stereo RGB inputs. CRRF uses a novel superpixel-based regression framework that takes advantage of the smoothness of the hand’s depth surface. An RRF unary term adaptively selects different stereo-matching measures as it implicitly determines matching pixels in a coarse-to-fine manner. CRRF also includes a pairwise term that encourages smoothness between similar adjacent superpixels. Experimental results show that CRRF can produce high-quality depth maps even when using an inexpensive RGB stereo camera, and produces state-of-the-art results for hand depth estimation.
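The unary-plus-pairwise decomposition the abstract describes fits the standard superpixel CRF template; the generic form below is an assumption for orientation (the paper's exact potentials and learned weights may differ), with d_i the depth of superpixel i, (I_L, I_R) the stereo pair, N the superpixel adjacency set and w_{ij} a similarity weight.

```latex
E(\mathbf{d}) = \sum_{i} \psi_{\text{RRF}}\!\left(d_i \mid I_L, I_R\right)
              + \lambda \sum_{(i,j) \in \mathcal{N}} w_{ij}\,\left(d_i - d_j\right)^2
```

Minimising such an energy trades off the RRF's per-superpixel stereo evidence against smoothness between similar neighbours, which is what lets the method recover a clean depth surface from a cheap RGB stereo camera.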